Refactor oci-copy to be more efficient #1130

Merged: 3 commits, Jul 8, 2024

ralphbean
Member

Originally, this task would download all artifacts requested in the input file, check them all, and then upload them all to the registry in one invocation of "oras push".

This had two problems. First, if "oras push" flaked out part way through and the user needed to retry their pipeline, the entire download section would need to be run again needlessly. Second, for extremely large artifacts with lots of medium-sized files, an enormous PVC would be needed to hold all of them between download and push to the registry.

The change here addresses both problems.

First, files are downloaded, checked, pushed to the registry and then deleted from local storage, one at a time. This removes the need for a volume large enough to hold every file at once: the task only needs enough storage for the single file it is currently processing.
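The one-at-a-time flow can be sketched roughly as follows. This is only an illustration in Python, not the task's actual script: the `Artifact` type, `copy_one_at_a_time`, and the injected `download` and `push` callables are hypothetical names standing in for whatever fetch and registry-push commands the task really runs.

```python
import hashlib
import os
from typing import Callable, Iterable, NamedTuple


class Artifact(NamedTuple):
    """Hypothetical description of one entry in the input file."""
    filename: str
    url: str
    sha256: str  # expected hex digest of the file contents


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_one_at_a_time(
    artifacts: Iterable[Artifact],
    download: Callable[[str, str], None],   # assumed: (url, dest_path)
    push: Callable[[str, Artifact], None],  # assumed: (local_path, artifact)
    workdir: str,
) -> None:
    """Download, check, push and delete each file before touching the next."""
    for art in artifacts:
        path = os.path.join(workdir, art.filename)
        try:
            download(art.url, path)
            actual = sha256_of(path)
            if actual != art.sha256:
                raise ValueError(
                    f"checksum mismatch for {art.filename}: "
                    f"expected {art.sha256}, got {actual}"
                )
            push(path, art)
        finally:
            # Remove the file whether or not the push succeeded, so at most
            # one artifact occupies local storage at any time.
            if os.path.exists(path):
                os.remove(path)
```

With this shape, peak disk usage is bounded by the largest single file rather than by the sum of all files.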

Second, before each file is downloaded, the registry is checked to see whether its blob has already been pushed there. If it has, the download step is skipped. This greatly improves the runtime for artifacts where only one or two of many files have changed since the last taskrun.
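One way to picture the existence check is sketched below. It relies on the OCI distribution API, where a `HEAD` request to `/v2/<name>/blobs/<digest>` returns 200 when the blob is present and 404 when it is not. The sketch assumes each file is stored as a blob whose digest is the file's SHA-256, and it glosses over registry authentication; `blob_url`, `blob_exists`, and `split_by_presence` are hypothetical helper names, not the task's own code.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple


def blob_url(registry: str, repository: str, sha256_hex: str) -> str:
    """Build the OCI distribution blob endpoint for a sha256 digest."""
    return f"https://{registry}/v2/{repository}/blobs/sha256:{sha256_hex}"


def blob_exists(url: str, token: Optional[str] = None) -> bool:
    """HEAD the blob endpoint: 200 means present, 404 means absent.

    Assumption: a bearer token (if needed) was obtained elsewhere.
    """
    req = urllib.request.Request(url, method="HEAD")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # auth failures and server errors should not look like "absent"


def split_by_presence(
    digests: Iterable[str],
    exists: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Return (to_download, already_pushed), preserving input order."""
    to_download: List[str] = []
    already_pushed: List[str] = []
    for digest in digests:
        (already_pushed if exists(digest) else to_download).append(digest)
    return to_download, already_pushed
```

Passing the check in as a callable keeps the skip logic separate from the network call, so a retry after a failed push only re-downloads the digests that land in `to_download`.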

ralphbean force-pushed the efficient-copy branch 2 times, most recently from 04b5919 to 1478c32 (July 8, 2024 13:24)
Contributor

@chmeliik chmeliik left a comment

Nice solution, if somewhat low-level

Just one nitpick for extracting the repo from the image ref (mainly for consistency with other tasks)

task/oci-copy/0.1/oci-copy.yaml (review thread outdated and resolved)
ralphbean and others added 3 commits July 8, 2024 12:05
Theoretically, this works if the IMAGE reference contains a port number.

Co-authored-by: Adam Cmiel <[email protected]>
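The port-number concern comes from the fact that a colon can mean either "port" or "tag" in an image reference. A minimal sketch of extracting the repository while tolerating a port is shown below. `repo_from_image_ref` is a hypothetical helper written for illustration, not the code that was merged; it assumes the reference contains at least one `/` whenever the registry host includes a port.

```python
def repo_from_image_ref(image: str) -> str:
    """Strip the tag and/or digest from an image reference.

    A tag colon can only appear after the last '/', while a port colon
    appears before it, so only a colon past the last slash is removed.
    """
    # Drop a trailing digest such as "@sha256:...".
    name = image.split("@", 1)[0]
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name = name[:last_colon]
    return name


print(repo_from_image_ref("quay.io/org/repo:latest"))         # quay.io/org/repo
print(repo_from_image_ref("localhost:5000/org/repo:latest"))  # localhost:5000/org/repo
print(repo_from_image_ref("localhost:5000/org/repo"))         # localhost:5000/org/repo
```

Cutting at the first colon instead would truncate `localhost:5000/org/repo:latest` to `localhost`, which is the failure mode a port-aware approach avoids.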
@ralphbean ralphbean added this pull request to the merge queue Jul 8, 2024
Merged via the queue into main with commit b70abf2 Jul 8, 2024
7 checks passed
3 participants